Intro

The RedHot analysis was completed to compare the health of western redcedar across urban heat islands. Redcedar health data were collected by community scientists in the Western Redcedar Dieback Map project on iNaturalist. Urban heat data were commissioned by three cities in the northwest using methods described in Voelkel and Shandas (2017).

Questions

Are income and tree size important effects for the vulnerability of redcedar to urban heat?

Methods

Data Descriptions

Observations of western redcedar were downloaded from iNaturalist and urban heat data were downloaded from open data portals or provided by contacts in the City of Tacoma, King County (Washington) and Portland. Trees in WA were also evaluated based on EHD Ranks. HOLC data were also investigated for each city.

iNaturalist

  • iNaturalist data were downloaded from the Western Redcedar Dieback Map on 1.13.24:
    • full data (query: quality_grade=any&identifications=any%projects%5B%5D=western-redcedar-dieback-map)
      • Note we needed to specify all fields related to place and all fields related to the project

Urban Heat

  • Urban Heat Island data were obtained for the following locations

Note temperature data is different for each dataset and may have been collected slightly differently. Temperature data will need to be standardized (difference from mean) for each dataset, then temperatures can be compared region wide.

Environmental Health Disparity Data

Environmental Health Disparity Rank Data (Full EHD Rank data (v2)) was downloaded on 6.17.23 from https://geo.wa.gov/datasets/WADOH::full-environmental-health-disparities-version-2-extract/explore

Metadata is available in this technical report: https://doh.wa.gov/sites/default/files/2022-07/311-011-EHD-Map-Tech-Report_0.pdf?uid=634dcf4aec2b5

  • EHD
    • Environmental Exposures
      • “Diesel_PM2”
      • “Ozone_Conc”
      • “PM2_5”
      • “Proximity_”
      • “Toxic_Rele”
    • Environmental Effects
      • “Lead_Risk_”
      • “PTSDFs”
      • “PNPL”
      • “PRMP”
      • “PWDIS”
    • Socioeconomic Factors
      • “LEP”
      • “No_HS_Dipl”
      • “POC”
      • “Poverty”
      • “Transporta”
      • “Unaffordab”
      • “Unemployed”
    • Sensitive Populations
      • “CVD”
      • “LBW”

HOLC Data

Home Owners Loan Corporation Maps were downloaded from the Mapping Inequality Project at the following links + Tacoma + Portland + Seattle

Data Wrangle

City tree data were ‘joined by attribute’ separately so we have 3 different tree datasets to work with or merge.

Also, given the UHI data were extracted with shapefiles, the column names are limited to 10characters. Therefore, exported UHI data only include iNat ID numbers and UHI data. These were then re-merged (see below) with the iNat data to get the remaining columns with proper names.

Import

Prep for QGIS Data Extracts

Filter data to only include necessary columns

[1] “id”
[23] “latitude”
[24] “longitude” [35] “place_town_name”
[36] “place_county_name”

Split Data into Areas

  • Note: Alternative option is to export iNat observations in these locations rather than export all observations than filter to these locations
    • Tacoma (query: quality_grade=any&identifications=any&place_id=186123&projects%5B%5D=western-redcedar-dieback-map)
    • Portland Metro Area (query: quality_grade=any&identifications=any&place_id=122420&projects%5B%5D=western-redcedar-dieback-map)
    • King County (query: quality_grade=any&identifications=any&place_id=1282&projects%5B%5D=western-redcedar-dieback-map)

Export data for QGIS data joins

Tacoma - 357 Trees Portland - 465 Trees King County - 516 Trees

QGIS Methods

  • Import into QGIS
    • Data Source Manager - Delimited Text - Browse to R working directory
Extract Heat Data for Trees
  • QGIS - Extract heat data for each point
    • Add Heat Data
      • Tacoma
        • tac_pm.tif
        • tac_am.tif
        • tac_af.tif
      • King County
        • pm_t_f_ranger.tiff
        • af_t_f_ranger.tiff
        • am_t_f_ranger.tiff
      • Portland
        • 825a_2 (am)
        • 825b_2 (af)
        • 825c_2 (pm)
    • Open processing tools panel and search for ‘sample raster values’
      • (View > Panels > processing tools)
      • Sample raster values for each temperature time series and each area (e.g. )
Extract HOLC Data
  • Add HOLC data to trees layers for each city
    • join attributes by location (Vector > Data Management Tools > Join Attributes By Location)
    • e.g. Join features in Portland Trees that Intersect by comparing to Portland HOLC 1.13.24
    • Advanced settings - do not filter (mark in in both layers) + May get warning: No spatial index exists.. + Right click each layer and click ‘Create Spatial index’ in Source Tab
    • Export temporary data for trees
Extract EHD Rank Data for Trees
  • Add EHD RAnk data downloaded on 6.17.23 from https://geo.wa.gov/datasets/WADOH::full-environmental-health-disparities-version-2-extract/explore
  • Add EHD Data to trees layer
    • join attributes by location (Vector > Data Management Tools > Join Attributes By Location)
    • Advanced settings - do not filter (mark in in both layers)
      • May get warning: No spatial index exists..
        • Right click each layer and click ‘Create Spatial index’ in Source Tab
    • Export temp data for trees

Community Hypothesis Test - Tree Selection

  • Random Tree Selection for Portland Redhot Hypothesis Test.
    • Define Project CRS WGS 84 EPSG:4326
    • Add Portland Urban Heat Data
      • GIS > Urban Heat GIS Data > Portland > a, b,c tiffs
      • Convert UHI Raster to shp files x 3
      • Raster > Conversion > Polygonize > Default Settings (DN_AM for morning temps (825a), DN_AF for afternoon etc.)
      • Export temporary ‘vectorized’ layers to shp files with default crs Create Spatial Index for each layer (avoids warning in below joining steps)
      • Right click each layer and click ‘Create Spatial index’ in Source Tab Extract temp data for trees
    • Add Street Tree Data
    • Join attributs by location
      • Vector > Data Management Tools > Join Attributes by location
      • Trees that intersect with vectorized rasters
      • Advanced settings - do not filter (mark in both layers)
    • Limit trees to “DBH” > 30 and “DBH” < 40? (leaves 160 trees)
    • Randomly select 100 (drops 60 trees)

Export final .shp files as .csv

Re-import Data after following QGIS Methods

Join UHI and iNat Data

Mutate

Note we needed to convert Tacoma temps to F to match king county

Daily Means
Morning Means
Afternoon Means
Evening Means

Merge Data (cities)

Clean

Some of the iNat project questions changed since it was created so some we need to adjust the answers to be more consistent throughout the project.

Reclassify response category variables
Check for outliers in temperature data
Afternoon
## Warning: Removed 36 rows containing missing values (`geom_point()`).

Should we remove these four outliers with less than -8 dist from mean af?

Morning

Evening

Mutate

Convert Percent to Proportion

Filter

Remove Dead Trees
Subset WA Data
Subset KC Data
Subset HOLC Data

Visualization

Distributions

Number of trees

## Warning in geom_histogram(stat = "count"): Ignoring unknown parameters:
## `binwidth`, `bins`, and `pad`

## Warning in geom_histogram(stat = "count"): Ignoring unknown parameters:
## `binwidth`, `bins`, and `pad`

Response Variable

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The data is likely zero inflated

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There are some data that need to be cleaned, healthy trees should not have dieback percent of 100.

binomial distributions

Ordinal regression, maybe poisson distribution?

Tree Health Categories

Some cleaning is needed, some healthy observations have dieback and some dead trees have 0% dieback.

Density Plots

Urban Heat

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

Environmental Exposures

Note we need to remove some outliers

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Data Summaries

Trees per health category

Binary Response (2 categories)

## # A tibble: 2 × 2
## # Groups:   binary.tree.canopy.symptoms [2]
##   binary.tree.canopy.symptoms     n
##   <fct>                       <int>
## 1 Healthy                       815
## 2 Unhealthy                     424

Filtered response (5 categories)

## # A tibble: 4 × 2
## # Groups:   reclassified.tree.canopy.symptoms [4]
##   reclassified.tree.canopy.symptoms     n
##   <fct>                             <int>
## 1 Healthy                             815
## 2 Thinning Canopy                     165
## 3 Dead Top                            111
## 4 Other                               148

Community scientists

## # A tibble: 146 × 2
## # Groups:   user_login [146]
##    user_login           n
##    <chr>            <int>
##  1 abbielisabeth        4
##  2 abbigail_white       5
##  3 abe                  1
##  4 adrienne_stclair     2
##  5 akreiner             6
##  6 alexis_mushroom     57
##  7 alginger             4
##  8 amy_boucher          1
##  9 and_allies           3
## 10 angela_mabel        50
## # ℹ 136 more rows
##   n.participants n.obs
## 1            146  1239
##   n.participants n.obs
## 1             96   785
## # A tibble: 3 × 1
##   Area       
##   <chr>      
## 1 King County
## 2 Portland   
## 3 Tacoma

Analyses

Dieback and Afternoon Heat

Does tree dieback (proportion/percent dieback) increase with increases in urban heat?

Zero inflated beta regression

Because the response variable is percent (we can consider it as proportion too) we can do regression using a beta distribution (comparing shapes) rather than a linear regression. We also need to use an distribution zero or one inflated rate.

Confirm no co-linearity

Calculate proportion of obs that have zero dieback

Fit models

The original model threw the following error because we had dead trees with proportions of 1.0.

zibeta.daily <- glmmTMB(field.dieback.prop~DN_AF1 + (1|Area),data=data,ziformula=~1,family=beta_family()) Error in eval(family$initialize) : y values must be 0 <= y < 1

One way around is to change 1s to .99 as suggested by Ben Bolker here. However, another option, as noted in the GLMM FAQ (Beta GLMMs section) is to use the brms package because it can handle zero-one inflated models.

Another, possibly more biologically relevant, option is to remove the dead trees from the analysis. This may make the most sense becuase we’re not sure if why they died.

Dead trees were excluded from below models

##  Family: beta  ( logit )
## Formula:          field.dieback.prop ~ dist.from.mean.daily + (1 | Area)
## Zero inflation:                      ~dist.from.mean.daily
## Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   1130.8   1161.5   -559.4   1118.8     1233 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  Area   (Intercept) 0.02493  0.1579  
## Number of obs: 1239, groups:  Area, 3
## 
## Dispersion parameter for beta family (): 2.32 
## 
## Conditional model:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -1.20299    0.11043 -10.894   <2e-16 ***
## dist.from.mean.daily  0.05940    0.03501   1.697   0.0898 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           0.57832    0.05956    9.71  < 2e-16 ***
## dist.from.mean.daily -0.13001    0.03988   -3.26  0.00111 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In general, we know temperature is relevant for predicting percent dieback.

Compare models

##              dAIC df
## zibeta.af     0.0 6 
## zibeta.daily 16.2 6 
## zibeta.pm    24.9 6 
## zibeta.am    29.2 6

Lowest score is most preferred model. Difference in AIC scores can be intrepreted as ‘extra information lost’ by using the worse model in comparison to the better model.

The model with AF temperatures was the best fit.

##  Family: beta  ( logit )
## Formula:          field.dieback.prop ~ dist.from.mean.af + (1 | Area)
## Zero inflation:                      ~dist.from.mean.af
## Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   1114.6   1145.3   -551.3   1102.6     1233 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  Area   (Intercept) 0.03499  0.187   
## Number of obs: 1239, groups:  Area, 3
## 
## Dispersion parameter for beta family (): 2.34 
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.22920    0.12552  -9.793   <2e-16 ***
## dist.from.mean.af  0.06471    0.02567   2.521   0.0117 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        0.57931    0.05986   9.677  < 2e-16 ***
## dist.from.mean.af -0.11940    0.02504  -4.769 1.85e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The conditional output is indicating that for all observations with greater than 0 dieback proportions, distance from mean am influences the proprtion of dieback. The zi model tells us which predictor increases the propability of non-zero. > interesting that the distance from mean am has a negative estimate though - hard to know if it is observations higher than the mean or lower than the mean (ie as temp increases, distance from mean decreases (but higher or lower than mean))

Including a predictor in the zi= bit of the model would test whether heat influences whether there is dieback at all.

Possible understanding from: https://stats.stackexchange.com/questions/466047/interpreting-output-for-glmmtmb-for-zero-inflated-count-data

Answers reference ‘pages 382-383 explain all components of the model summary:’

Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J., Machler, M., & Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for Zero-inflated Generalized Linear Mixed Modeling. The R Journal, 9(2), 378-400. https://doi.org/10.32614/RJ-2017-066

Confirm model is zero inflated

## 
##  DHARMa zero-inflation test via comparison to expected zeros with
##  simulation under H0 = fitted model
## 
## data:  simulationOutput
## ratioObsSim = 0.99986, p-value = 1
## alternative hypothesis: two.sided

Check for dispersion

Check if it can cope with zeros

Environmental Health Disparities

Temperature and EHD Rank

##  Family: beta  ( logit )
## Formula:          
## field.dieback.prop ~ dist.from.mean.af + Environm_1 + (1 | Area)
## Zero inflation:                      ~dist.from.mean.af + Environm_1
## Data: wa.data
## 
##      AIC      BIC   logLik deviance df.resid 
##    754.3    791.6   -369.1    738.3      779 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  Area   (Intercept) 0.019    0.1378  
## Number of obs: 787, groups:  Area, 2
## 
## Dispersion parameter for beta family (): 2.34 
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.22708    0.23968  -5.120 3.06e-07 ***
## dist.from.mean.af  0.11767    0.04434   2.654  0.00795 ** 
## Environm_1         0.01394    0.02707   0.515  0.60644    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)        0.23445    0.25794   0.909    0.363
## dist.from.mean.af -0.06615    0.05451  -1.214    0.225
## Environm_1         0.00023    0.03302   0.007    0.994

Temperature and Poverty

##  Family: beta  ( logit )
## Formula:          field.dieback.prop ~ dist.from.mean.af + Poverty + (1 | Area)
## Zero inflation:                      ~dist.from.mean.af + Poverty
## Data: wa.data
## 
##      AIC      BIC   logLik deviance df.resid 
##    753.2    790.5   -368.6    737.2      779 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  Area   (Intercept) 0.02134  0.1461  
## Number of obs: 787, groups:  Area, 2
## 
## Dispersion parameter for beta family (): 2.35 
## 
## Conditional model:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.216093   0.165280  -7.358 1.87e-13 ***
## dist.from.mean.af  0.108746   0.045386   2.396   0.0166 *  
## Poverty            0.004307   0.004919   0.876   0.3812    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)        0.135740   0.144838   0.937    0.349
## dist.from.mean.af -0.075528   0.055810  -1.353    0.176
## Poverty            0.004635   0.005813   0.797    0.425

HOLC Data

##  Family: beta  ( logit )
## Formula:          field.dieback.prop ~ dist.from.mean.af + grade + (1 | Area)
## Zero inflation:                      ~dist.from.mean.af + Poverty
## Data: holc.data
## 
##      AIC      BIC   logLik deviance df.resid 
##    258.2    295.4   -119.1    238.2      295 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance  Std.Dev. 
##  Area   (Intercept) 2.507e-10 1.583e-05
## Number of obs: 305, groups:  Area, 2
## 
## Dispersion parameter for beta family (): 2.57 
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.30573    0.30234  -4.319 1.57e-05 ***
## dist.from.mean.af  0.12976    0.07555   1.718   0.0859 .  
## gradeB            -0.23018    0.34161  -0.674   0.5004    
## gradeC             0.24652    0.32804   0.751   0.4524    
## gradeD            -0.18088    0.41120  -0.440   0.6600    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)        0.010073   0.264888   0.038    0.970
## dist.from.mean.af -0.149360   0.091695  -1.629    0.103
## Poverty            0.014059   0.009987   1.408    0.159

Discussion

Most important temperature time

Which temperatures (morning, afternoon, eve) are most important for determining tree health?

Dieback and Pollution

Does dieback increase with increases in pollution?

Which pollution data are important for determining tree health?

Are Temperatures higher for trees in urban areas of King County?

References